(54) Structured documents on the WWW



(57) A system for retrieving a selected page of a 
structured document and for automatically developing 
context information about the selected page. This con- 
text information may include a table of contents showing 
the location of the selected hypertext page in relation- 
ship to other hypertext pages. In one embodiment, this 
context information is inserted into the hypertext page. 
The so-modified hypertext page may then be transmitted
to a remote location for display. Since the context
information is automatically developed after retrieval, it 
need not be manually generated and maintained. For 
WWW applications, the hypertext page with the context 
information inserted remains in the HTML format view- 
able by standard browsers. A powerful and convenient 
system for browsing through structured documents is 
thus provided. 
Description

BACKGROUND OF THE INVENTION 

The present invention relates to the presentation of
structured documents in a hypertext browsing system 
and more particularly to the presentation of context in- 
formation for a hypertext page. 

Most technical documentation incorporates a hier- 
archical structure of chapters, sections, subsections, 
etc. It is known that systems for on-line browsing of 
structured documents operate most effectively when the 
user can see where currently displayed information is 
located within the hierarchy. 

Accordingly, structured document viewing interfaces
have been developed that display two panes, a first
pane showing information desired by the user and a sec- 
ond pane showing a table of contents. The location of 
the information shown in the first pane is highlighted in 
the table of contents. The highlighted location must of 
course be updated every time new information is dis- 
played in the first pane. Since the table of contents must 
be generated in advance for each page of information 
displayable in the first pane, it is normally very simpli- 
fied, showing only the largest divisions of the structured 
document. 

It is desirable however to display context informa- 
tion for very large structured documents including many 
pages and to particularize the context information for 
each page. It is also desirable that the context informa- 
tion be more detailed than a simple list of the major di- 
visions of the structured document. Generating this in- 
formation manually presents various problems. The 
large number of pages means that many man-hours are 
required to generate the context information for each 
page. Furthermore, structured documents are frequent- 
ly updated with additions, deletions, and modifications 
of pages. These updates render previously generated 
table-of-contents information obsolete.

Further problems arise in considering the presenta- 
tion of structured documents across the World Wide 
Web (WWW) or other network-based hypertext brows- 
ing environments. The operator of a web site storing a 
structured document cannot assume that all users are 
operating a hypertext browser that provides multiple 
pane displays. 

SUMMARY OF THE INVENTION 

By virtue of the present invention, a system is pro- 
vided for retrieving a selected page of a structured doc- 
ument and for automatically developing context infor- 
mation about the selected page. This context informa- 
tion may include a table of contents showing the location 
of the selected hypertext page in relationship to other 
hypertext pages. In one embodiment, this context information
is inserted into the hypertext page. The so-modified
hypertext page may then be transmitted to a remote
location for display. Since the context information is
automatically developed after retrieval, it need not be
manually generated and maintained. For WWW applications,
the hypertext page with the context information
inserted remains in the HTML format viewable by standard
browsers. The present invention thus provides a
powerful and convenient system for browsing through 
structured documents. 

In one embodiment, the table-of-contents informa- 
tion is presented in a "fisheye" view at the top of the hy- 
pertext page. For a WWW application, when a user re- 
quests a page of the structured document, the system 
concatenates the HTML source for that page with a fish- 
eye view of the table-of-contents. The resulting string of 
HTML text is sent over a network to the user's web 
browser where it may be displayed. 

The table-of-contents display may include the 
names of other pages of the structured document. One 
aspect of the present invention provides many possible 
techniques for obtaining these names. For example, a 
table-of-contents database may be maintained includ- 
ing these names. The names could be retrieved from 
the HTML title or heading tags of the pages. Also, the 
filename of a page could be used as the name. 

In one embodiment, the structured document has a 
tree structure. A single root page has one or more chil- 
dren pages which in turn have one or more children and 
so on. The fisheye table-of-contents view shows the 
names of each parent of the presently displayed page 
up until the root page as well as the names of "sibling" 
pages, i.e., pages that share a common first generation 
parent. 

Another aspect of the present invention provides 
many techniques for determining the parents of a se- 
lected page. For example, the parents of a particular 
page may be determined by consulting a table-of-con- 
tents database. Alternatively, each page may include a 
special tag identifying at least its first generation parent. 
If the pages are stored within a hierarchical file system, 
the parent could be defined as the file stored in the same 
directory as the page with a special filename such as 
"index.html". If there is no such file, the parent directory 
of the directory holding the selected page is searched 
for such a file. If the parent directory has no such file, 
the first file in the parent directory having a particular 
suffix, such as ".html" is identified as the parent page. 
Otherwise, the selected page is determined to be the 
root page. 

A further understanding of the nature and advan- 
tages of the inventions herein may be realized by refer- 
ence to the remaining portions of the specification and 
the attached drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1A depicts a block diagram of a host computer
system suitable for implementing the present invention.

Fig. 1B depicts the interconnection of the host computer
system to remote clients.

Fig. 2 is a hypertext page display modified in ac- 
cordance with one embodiment of the present invention 
to include a "fisheye" table-of-contents. 

Fig. 3 is a segment of HTML code inserted to gen- 
erate the table-of-contents of Fig. 2. 

Fig. 4 is a flowchart describing steps of identifying 
a parent of a selected page of a structured document in 
accordance with one embodiment of the present inven- 
tion. 

Fig. 5 is a flowchart describing steps of obtaining a 
name of a selected page of a structured document in 
accordance with one embodiment of the present inven- 
tion. 

DESCRIPTION OF SPECIFIC EMBODIMENTS 

Fig. 1A depicts a block diagram of a host computer
system 10 suitable for implementing the present inven- 
tion. Host computer system 10 includes a bus 12 which 
interconnects major subsystems such as a central proc- 
essor 14, a system memory 16 (typically RAM), an input/ 
output (I/O) controller 18, an external device such as a 
display screen 24 via display adapter 26, serial ports 28 
and 30, a keyboard 32, a storage interface 34, a floppy 
disk drive 36 operative to receive a floppy disk 38, and 
a CD-ROM player 40 operative to receive a CD-ROM 
42. Storage interface 34 may connect to a fixed disk 
drive 44. Fixed disk drive 44 may be a part of host com- 
puter system 10 or may be separate and accessed 
through other interface systems. Many other devices 
can be connected such as a mouse 46 connected via 
serial port 28 and a network interface 48 connected via 
serial port 30. Network interface 48 may provide a direct 
connection to a remote server via a telephone link or to 
the Internet via a POP (point of presence). Many other 
devices or subsystems (not shown) may be connected 
in a similar manner. 

Also, it is not necessary for all of the devices shown 
in Fig. 1A to be present to practice the present invention,
as discussed below. The devices and subsystems may
be interconnected in different ways from that shown in
Fig. 1A. The operation of a computer system such as
that shown in Fig. 1A is readily known in the art and is
not discussed in detail in this application. Code to im- 
plement the present invention may be operably dis- 
posed or stored in computer-readable storage media 
such as system memory 16, fixed disk 44, CD-ROM 42, 
or floppy disk 38. 

Fig. 1B depicts the interconnection of host computer
system 10 to client systems 50, 52, and 54. Fig. 1B
depicts the Internet 56 interconnecting client systems
50, 52, and 54. Modem 48 or some other network interface
provides the connection from host computer system
10 to the Internet 56. Protocols for exchanging data
via the Internet are well known and need not be discussed
herein. Although Fig. 1B depicts the use of the
Internet for exchanging data, the present invention is not
limited to the Internet or any network-based environment
for that matter.

In one embodiment of the present invention, host
computer system 10 has access to a structured document
via storage interface 34. The structured document
includes many pages, each typically stored in a separate
file. For WWW applications, an HTTP server operates
on host computer system 10 and these files are
typically in HTML format. The document has a tree
structure with a single root page that has one or more
child pages. Each child page in turn may have one or
more children of its own. Thus, each page in the structured
document can trace its ancestry to the root through
one or more parents. Many pages in the structured document
may have siblings, that is, other pages that share
a common first generation parent.
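
The tree relationships described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page names and the `PARENT` mapping are invented, and the specification does not prescribe any particular data structure.

```python
# Hypothetical parent map: each page maps to its first generation parent.
# The root page maps to None. All page names are invented for illustration.
PARENT = {
    "index.html": None,
    "products.html": "index.html",
    "research.html": "index.html",
    "labs.html": "research.html",
    "people.html": "research.html",
}


def ancestors(page):
    """Return the chain of parents, nearest parent first, ending at the root."""
    chain = []
    parent = PARENT[page]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def siblings(page):
    """Return the other pages that share the same first generation parent."""
    parent = PARENT[page]
    if parent is None:
        return []  # the root page has no siblings
    return [p for p, q in PARENT.items() if q == parent and p != page]
```

Here `ancestors` traces a page's ancestry to the root, and `siblings` collects the pages that share its first generation parent.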

This structure is merely representative and other
structures may be accommodated within the scope of
the present invention. One could accommodate associative
structures with typed links between information objects.
For example, a geographic information structure
might have links of the type "nearby" to indicate location
and links of types "designed-by" and "has-designed" to
connect buildings to architects and architects to buildings.

Client systems 50, 52, and 54 operate hypertext
browsers configured to access host computer system
10 over the Internet 56 and to retrieve selected pages
of the structured document for local display. One aspect
of the present invention provides automatic generation
of context information at host system 10 for a selected
page. The context information may then be inserted into
the page prior to transmission to the requesting client
system.

One type of context information that may be provided
within the scope of the present invention is a so-called
"fisheye" view of the table-of-contents of the
structured document as explained below. A fisheye view
is one that combines local detail with global context. Fig.
2 is a hypertext page display 200 modified in accordance
with one embodiment of the present invention to
include a fisheye table-of-contents 202. Display 200
shows the inventor's home page on the WWW which is
one page in a structured document.

Table-of-contents 202 includes a list of names including
a name 204 of the currently displayed page,
names 208 of parent pages of the currently displayed
page, and names 210 of other pages having the same
first generation parent as the currently displayed page,
i.e., sibling pages. Name 204 appears in bold print to
signify that it represents the currently displayed page.
Names 208 and 210 appear as highlighted links that
when activated take the user to the identified pages.
The indentation of entries in table-of-contents 202
helps the user rapidly assess the relationship of the displayed
page to the overall document structure. "Sun Microsystems"
is the root page and thus appears at the far
left margin. Each layer of the hierarchy is indented three
spaces more than the layer above it. The pages identified
by names 210 and name 204 share a common first
generation parent and thus appear with the same degree
of indentation.
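
As a rough illustration of this indentation scheme, the following Python sketch lays out a fisheye listing as plain text. The function name `fisheye_lines` and its arguments are assumptions made for this example. Only the three-space step per layer is taken from the description.

```python
def fisheye_lines(ancestors_root_first, current_row, indent=3):
    """Lay out a fisheye listing as plain-text lines.

    ancestors_root_first: names of the parents, starting with the root page.
    current_row: names of the current page and its siblings, in display order.
    Each layer is indented `indent` spaces more than the layer above it.
    """
    lines = []
    for depth, name in enumerate(ancestors_root_first):
        lines.append(" " * (indent * depth) + name)
    # The current page and its siblings share one level below the last parent.
    depth = len(ancestors_root_first)
    for name in current_row:
        lines.append(" " * (indent * depth) + name)
    return lines
```

With the root at the far left margin, every entry in `current_row` receives the same degree of indentation, which mirrors the layout described for names 204 and 210.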

With this understanding of the indentation scheme 
in mind, it will be understood that the global context of 
the currently displayed page is apparent from table-of- 
contents 202 since the full chain of ancestry from the 
presently displayed page to the root page is shown. The 
names of the sibling pages clarify the local detail. Table-of-contents
202 is thus an extremely useful tool for understanding
the overall structure of the document and
navigating through it. When the user shifts to a new page
in the structured document, he or she sees an updated 
table-of-contents display that reflects the context of the 
newly displayed page. 

Table-of-contents 202 represents only one possible 
arrangement of context information within the scope of 
the present invention. For example, an alternative ar- 
rangement is to display only the chain of ancestry to the 
root and not the sibling pages. Another alternative ar- 
rangement is to display only a portion of the chain of 
ancestry but to also display all descendants of the sec- 
ond generation parent of the selected page. 

Depending on the structure of the document, other 
displays could be substituted for table-of-contents 202. 
For example, for the geographic information structure 
described above, when the user is viewing a hypertext 
page for a particular building, the present invention may 
provide a display listing other buildings near the location 
of the particular building as well as other buildings de- 
signed by the same architect. 
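
A minimal Python sketch of such a typed-link structure follows. The building and architect names are invented, and the flat list of triples is an assumption chosen for brevity. The link types "nearby", "designed-by" and "has-designed" come from the description.

```python
# Typed links stored as (source, link_type, target) triples.
# All object names below are invented for illustration.
LINKS = [
    ("Tower A", "nearby", "Hall B"),
    ("Tower A", "designed-by", "Architect X"),
    ("Architect X", "has-designed", "Tower A"),
    ("Architect X", "has-designed", "Museum C"),
]


def follow(source, link_type):
    """Return the targets reachable from source over links of one type."""
    return [t for s, k, t in LINKS if s == source and k == link_type]


def context_for_building(building):
    """Collect nearby buildings and other buildings by the same architect."""
    same_architect = []
    for architect in follow(building, "designed-by"):
        for other in follow(architect, "has-designed"):
            if other != building:
                same_architect.append(other)
    return {"nearby": follow(building, "nearby"), "same architect": same_architect}
```

The returned dictionary holds the two lists that such a display would present in place of a table-of-contents.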

Host system 10 automatically generates table-of- 
contents 202 by concatenating appropriate HTML code 
with a requested page prior to transmission to the re- 
questing client. Fig. 3 shows a segment 300 of HTML 
code inserted to generate table-of-contents display 202 
of Fig. 2. Subsegments of segment 300 are identified 
with the reference designators of Fig. 2 pointing to the 
corresponding text produced for display. <PRE> tag 302 
and </PRE> tag 304 identify the HTML code of Fig. 3 
as being preformatted text. Name 204 denoting the cur- 
rently displayed page is marked for <STRONG> format- 
ting which typically appears as bold text. The remainder 
of the names are given within link anchor tags which in- 
clude URLs of the identified pages. An explanation of 
HTML format for encoding web pages is found in Morris, 
HTML for Fun and Profit, (SunSoft Press 1995), the contents
of which are herein expressly incorporated by reference
for all purposes.
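
Fig. 3 itself is not reproduced here, so the following Python sketch only approximates the kind of segment described: preformatted text, the current page marked with STRONG, and the other names wrapped in link anchors. The function name `toc_html`, its arguments, and the example URLs are assumptions.

```python
from html import escape


def toc_html(ancestors_root_first, row, current, indent=3):
    """Build a preformatted table-of-contents segment.

    ancestors_root_first: list of (name, url) pairs, starting with the root.
    row: list of (name, url) pairs for the current page and its siblings.
    current: name of the currently displayed page.
    """
    def link(name, url):
        return f'<A HREF="{escape(url, quote=True)}">{escape(name)}</A>'

    lines = []
    for depth, (name, url) in enumerate(ancestors_root_first):
        lines.append(" " * (indent * depth) + link(name, url))
    pad = " " * (indent * len(ancestors_root_first))
    for name, url in row:
        if name == current:
            # The current page is emphasized rather than linked.
            lines.append(pad + f"<STRONG>{escape(name)}</STRONG>")
        else:
            lines.append(pad + link(name, url))
    return "<PRE>\n" + "\n".join(lines) + "\n</PRE>\n"
```

Because the segment is wrapped in PRE tags, the leading spaces survive rendering and produce the indentation without any table or frame support in the browser.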

In one embodiment, a document structure database 
accessible to host system 10 facilitates automatic gen- 
eration of the HTML code of Fig. 3. The document struc- 
ture database includes for each page, information about 
its name, information about its parent or the fact that it 
is a root page, and information about its children. 

Information about the parent and children of a page
may also be embedded within a special HTML tag within
the page. For example, the parent of a page may be
indicated as <!-- META NAME="parent" VALUE="filename.html" -->.
If the page has a tag <!-- META
NAME="rootnode" VALUE="thispage.html" -->, then it
has no parents. The children of a page may be indicated
as <!-- META NAME="child" VALUE="filename1.html" -->,
<!-- META NAME="child" VALUE="filename2.html" -->,
with each tag listed on a separate line.
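
One way to read these comment-style tags is with a regular expression, as in the Python sketch below. The pattern assumes the tags are written exactly in the quoted form shown above. The function name `structure_tags` is hypothetical.

```python
import re

# Matches tags of the form <!-- META NAME="..." VALUE="..." -->.
META_RE = re.compile(
    r'<!--\s*META\s+NAME="(?P<name>[^"]+)"\s+VALUE="(?P<value>[^"]+)"\s*-->',
    re.IGNORECASE,
)


def structure_tags(html_source):
    """Collect parent, rootnode and child values from a page's source."""
    tags = {"parent": [], "rootnode": [], "child": []}
    for match in META_RE.finditer(html_source):
        name = match.group("name").lower()
        if name in tags:
            tags[name].append(match.group("value"))
    return tags
```

A page whose "rootnode" list is non-empty would be treated as having no parents, and the "child" list preserves the order in which the tags appear.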

Fig. 4 is a flowchart describing steps of identifying
a parent of a selected page of a structured document in
accordance with one embodiment of the present invention.
At step 402, the document structure database is
checked to see if the parent is identified there. If the parent
is identified in the document structure database, the
identity of the parent is extracted from the database at
step 404. If the parent is not identified in the document
structure database (or if there is no such database available),
the selected page is scanned for the special META
tag described above at step 406. If such a tag is available,
the identity of the parent is extracted from the tag
at step 408.
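
The ordering of steps 402 through 408 can be sketched as a simple fallback in Python. The function name `resolve_parent` is hypothetical, the database is modelled as a plain dictionary, and the tag value is assumed to have been extracted from the page already.

```python
def resolve_parent(page, database=None, tag_parent=None):
    """Return (parent, source) per steps 402-408, or (None, None).

    database: optional dict mapping a page to its parent (an assumption;
        the specification does not fix a database format).
    tag_parent: parent named in the page's special META tag, if any.
    """
    # Steps 402/404: consult the document structure database, if present.
    if database is not None and database.get(page) is not None:
        return database[page], "database"
    # Steps 406/408: fall back to the parent named in the META tag.
    if tag_parent is not None:
        return tag_parent, "meta tag"
    # Neither source identified a parent; the directory search would follow.
    return None, None
```

Returning the source alongside the parent is a design choice made here only to make the fallback order visible to the caller.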

If such a tag is not available (or if the embodiment
does not provide such tags), the next step 410 is to
search for a file with a special filename, preferably "index.html",
in the same directory as the selected page.
This file is normally the master file of the directory (it
typically contains home page or index information) and
thus may serve as a parent. If such a file is found in the
same directory as the selected page, this file is identified
to be the parent at step 412. If such a file is not found,
at step 414, the directory of the selected page is
checked to see if it is in fact the root directory of the
hypertext documents served by the HTTP server at host
system 10.

If the directory of the selected page is in fact the root
directory, the current page is determined at step 416 to
have no parent page. If the directory of the selected
page is not the root directory, searching for the parent
page continues at step 418 where the parent directory
of the directory containing the selected page is also
checked for the file with the special filename. If this file
is found, it is identified to be the parent of the selected
page at step 420. If no such file is found, as a fallback
the parent directory is searched for any file having a special
suffix, preferably ".html", at step 422. If one or more
such files are found in the parent directory, the one first
in alphabetical order is identified to be the parent at step
424. If no such file is found, the selected page is determined
to have no parent page at step 426.
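
The directory-based portion of Fig. 4 (steps 410 through 426) might be sketched in Python as follows. The function name `find_parent_by_directory` and its parameters are hypothetical. The check that a page is not chosen as its own parent is an added assumption that the description does not state.

```python
from pathlib import Path


def find_parent_by_directory(page, root_dir, index_name="index.html", suffix=".html"):
    """Return the parent file as a Path, or None if no parent is found."""
    page = Path(page).resolve()
    root_dir = Path(root_dir).resolve()
    directory = page.parent

    # Steps 410/412: special file in the page's own directory.
    # Assumption: a page is never treated as its own parent.
    candidate = directory / index_name
    if candidate.is_file() and candidate != page:
        return candidate

    # Steps 414/416: a page in the served root directory has no parent.
    if directory == root_dir:
        return None

    # Steps 418/420: special file in the parent directory.
    parent_dir = directory.parent
    candidate = parent_dir / index_name
    if candidate.is_file():
        return candidate

    # Steps 422/424: first file in alphabetical order with the suffix.
    matches = sorted(
        p for p in parent_dir.iterdir() if p.is_file() and p.suffix == suffix
    )
    if matches:
        return matches[0]

    # Step 426: no parent page.
    return None
```

Note that the sketch looks only one directory above the page, as the flowchart does, rather than climbing all the way to the root directory.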

The procedure of Fig. 4 is constructed to maximize
the chances of identifying a page that can be said to
represent the parent of the selected page even when
the document structure is not precisely defined. The procedure
for identifying a child page is similar to steps 402
through 406 of Fig. 4. Children are identified from the
database if possible, and otherwise from the "child" meta
tags if available or as the other files in the same directory
as the "index.html" file. In this last case (other
files in the same directory), the files are scanned for the
presence of a "parent" meta tag. A file is considered to
be a child if it either does not have a "parent" meta tag or
has a "parent" meta tag with the value equal to the parent
file. Once a first generation parent of a selected page
is found, children of the first generation parent are identified
in this way to find the siblings of the selected page.
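
The last case, in which children are taken from the other files in the directory of the index file, could look like the Python sketch below. The function name `children_by_directory` is hypothetical, and restricting the scan to files with an ".html" suffix is an assumption added for this example.

```python
import re
from pathlib import Path

PARENT_RE = re.compile(
    r'<!--\s*META\s+NAME="parent"\s+VALUE="([^"]+)"\s*-->', re.IGNORECASE
)


def children_by_directory(index_file, suffix=".html"):
    """Return the names of files in the index file's directory that count as children."""
    index_file = Path(index_file)
    children = []
    for path in sorted(index_file.parent.iterdir()):
        if path == index_file or not path.is_file() or path.suffix != suffix:
            continue
        match = PARENT_RE.search(path.read_text(errors="replace"))
        # A file is a child if it has no "parent" tag, or if its tag names
        # the index file.
        if match is None or match.group(1) == index_file.name:
            children.append(path.name)
    return children
```

Applying this to the first generation parent of a selected page yields that page together with its siblings.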

Fig. 5 is a flowchart describing steps of obtaining a
name of a selected page of a structured document in
accordance with one embodiment of the present invention.
At step 502, the document structure database is
searched for the name. If the name is found, it is extracted
from the database at step 504. If the name is not found
(or if there is no such database in the embodiment),
the HTML source code for the selected page is searched
for a <TITLE> tag at step 506. If the name is found in
the <TITLE> tag, it is extracted at step 508. If the name
is not found in the <TITLE> tag, the HTML source is
searched for a first level heading tag, i.e., an <H1> tag
at step 510. If such a tag is found, the name is extracted
from it at step 512. If such a tag is not found, the HTML
source is searched for a heading tag of any level, i.e.,
an <Hn> tag at step 514. If any such tag is found, the
name is extracted from the first one in the source
at step 516. If no such tag is found, the selected page's
file name is identified to be the name at step 518. This
procedure maximizes the chances of obtaining a name
that characterizes the contents of the page.
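
The fallback order of Fig. 5 can be sketched in Python as shown below. The function name `page_name` is hypothetical, the database is modelled as a dictionary keyed by path, and the regular expressions are a simplification that assumes well-formed, non-nested tags.

```python
import re
from pathlib import Path

# Patterns tried in order: TITLE, first level heading, heading of any level.
NAME_PATTERNS = [
    r"<title[^>]*>(.*?)</title>",
    r"<h1[^>]*>(.*?)</h1>",
    r"<h[1-6][^>]*>(.*?)</h[1-6]>",
]


def page_name(path, html_source, database=None):
    """Return a display name for a page following the order of Fig. 5."""
    # Steps 502/504: document structure database, if one exists.
    if database and path in database:
        return database[path]
    # Steps 506-516: TITLE tag, then H1, then the first heading of any level.
    for pattern in NAME_PATTERNS:
        match = re.search(pattern, html_source, re.IGNORECASE | re.DOTALL)
        if match:
            text = re.sub(r"<[^>]+>", "", match.group(1)).strip()
            if text:
                return text
    # Step 518: fall back to the file name.
    return Path(path).name
```

An empty TITLE or heading is skipped here so that the next fallback is tried, which is a choice made for this sketch rather than something the flowchart specifies.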

Once the parents and siblings of the selected page 
are identified in accordance with Fig. 4 and their names 
obtained in accordance with Fig. 5, host system 10 may 
construct a code segment as in Fig. 3 and concatenate 
it to the HTML source of the selected page. Host system 
10 may then transmit the selected page to a requesting 
client where it is viewable by any HTML browser. Prior 
to transmission, host system 10 may remove any spe- 
cial META tags from the HTML source. 
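
A minimal Python sketch of this final assembly step follows. It assumes the table-of-contents segment has already been generated and simply prepends it, matching the concatenation described above. The function name `build_response` is hypothetical.

```python
import re

# Matches the special comment-style META tags, plus the rest of their line.
STRUCTURE_TAG_RE = re.compile(
    r'[ \t]*<!--\s*META\s+NAME="(?:parent|rootnode|child)"'
    r'\s+VALUE="[^"]*"\s*-->[ \t]*\n?',
    re.IGNORECASE,
)


def build_response(toc_segment, page_source):
    """Strip the special META tags and place the segment at the top of the page."""
    cleaned = STRUCTURE_TAG_RE.sub("", page_source)
    return toc_segment + cleaned
```

A stricter variant might insert the segment just after the page's BODY tag instead of before the whole source, but that refinement is not part of the description.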

In the foregoing specification, the invention has 
been described with reference to specific exemplary 
embodiments thereof. It will, however, be evident that 
various modifications and changes may be made there- 
unto without departing from the broader spirit and scope 
of the invention as set forth in the appended claims. 

Furthermore, the flowcharts described herein are illustrative
of merely the broad logical flow of steps to
achieve a method of the present invention, and steps
may be added to, or taken away from, the flowcharts without
departing from the scope of the invention. Further,
the order of execution of steps in the flowcharts may be 
changed without departing from the scope of the inven- 
tion. Additional considerations in implementing the 
method described by the flowchart may dictate changes 
in the selection and order of steps. 

In general, the flowcharts in this specification in- 
clude one or more steps performed by software routines 
executing in a computer system. The routines may be 
implemented by any means as is known in the art. For 
example, any number of computer programming languages,
such as Java, "C", Pascal, FORTRAN, assembly
language, etc., may be used. Further, various programming
approaches such as procedural, object oriented
or artificial intelligence techniques may be employed.

Many such changes or modifications will be readily
apparent to one of ordinary skill in the art. For example,
although the described embodiments refer to operation
in the context of a network, the present invention will
also find application when structured documents are
stored and viewed on the same system. Even when implemented
in the network context, the present invention
is not limited to the WWW, or to HTML documents. The
specification and drawings are, accordingly, to be regarded
in an illustrative rather than a restrictive sense,
the invention being limited only by the provided claims
and their full scope of equivalents.

Claims

1. A computer-implemented method for presenting
hypertext page context information comprising the
steps of:

retrieving a selected hypertext page of a structured
document; and

automatically developing information showing
a context of said selected hypertext page within
said structured document.

2. The method of claim 1 wherein said structured document
comprises a tree structure wherein each
page descends from a root page through one or
more parents and said automatically developing
step comprises:

automatically developing information identifying
parents of said selected hypertext page.

3. The method of claim 1 wherein said structured document
comprises a tree structure wherein each
page descends from a root page through one or
more parents and said automatically developing
step comprises:

automatically developing information identifying
other hypertext pages descended from a parent
of said selected hypertext page.

4. The method of claim 1 wherein said automatically 
developing step comprises:

developing a name of another hypertext page 
within said structured document. 

5. The method of claim 4 wherein said name is obtained
by:

retrieving said name from a document struc- 
ture database of said structured document. 
6. The method of claim 4 wherein said name is obtained
by:

extracting a title from said another hypertext
page.

7. The method of claim 4 wherein said name is ob- 
tained by: 

extracting a file name of said another hyper- 
text page. 

8. The method of claim 4 wherein said another hyper- 
text page is in HTML format and said name is ob- 
tained by: 

extracting said name from a heading tag in 
said another hypertext page. 

9. The method of claim 2 wherein said automatically 
developing step further comprises: 

extracting information identifying a parent of 
said selected hypertext page from a document 
structure database of said structured document. 

10. The method of claim 1 wherein said automatically 
developing step further comprises: 

extracting information identifying a parent of 
said selected hypertext page from a selected tag 
within said selected hypertext page. 

11. The method of claim 10 further comprising: 

removing said selected tag from said selected 
hypertext page. 

12. The method of claim 2 wherein said hypertext page 
is stored as a file within a directory structure and 
wherein said automatically developing step further 
comprises: 

searching for an index file within said directory 
structure beginning with a directory of said hy- 
pertext page file and moving up said directory 
structure until said index file is found; and 
identifying said index file to be a parent of said 
hypertext page. 

13. The method of claim 1 further comprising: 

inserting said information at the top of said se- 
lected hypertext page. 

14. The method of claim 13 further comprising the step
of: 

transmitting said selected hypertext page as 
modified in said inserting step via a network. 

15. The method of claim 1 wherein said hypertext page 
is in HTML format. 

16. A computer system comprising: 



an electronic storage interface system having
access to a structured hypertext document;
a network interface system that receives a request
for a selected page of said hypertext document;

a processing system that is configured to respond
to said request by virtue of being configured
to:

retrieve said page via said electronic storage
system;

automatically develop information showing
a context of said selected page within said
structured document;
insert said information in said selected
page; and

transmit said selected page including said
context information via said network interface.

17. The computer system of claim 16 wherein said page 
is in HTML format. 

18. A computer program product for presenting hypertext
page context information, said product comprising:

code that retrieves a selected hypertext page
of a structured document;
code that automatically develops information
showing a context of said selected hypertext
page within said structured document; and
a computer-readable medium that stores the
codes.

19. The product of claim 18 wherein said structured
document comprises a tree structure wherein each
page descends from a root page through one or
more parents and said automatically developing
code comprises:

code that automatically develops information
identifying parents of said selected hypertext page.

20. The product of claim 18 wherein said structured
document comprises a tree structure wherein each
page descends from a root page through one or
more parents and said automatically developing
code comprises:

code that automatically develops information
identifying other hypertext pages descended from
a parent of said selected hypertext page.

21. The product of claim 18 wherein said automatically
developing code comprises:

code that obtains a name of another hypertext
page within said selected document.

22. The product of claim 21 wherein said name obtaining
code comprises:

code that retrieves said name from a docu- 
ment structure database of said structured docu- 
ment. 

23. The product of claim 21 wherein said name obtain- 
ing code comprises: 

code that extracts a title from said another hy- 
pertext page. 

24. The product of claim 21 wherein said name obtain- 
ing code comprises: 

code that extracts a file name of said another 
hypertext page. 

25. The product of claim 21 wherein said another hy- 
pertext page is in HTML format and said name ob- 
taining code comprises: 

code that extracts a heading tag from said an- 
other hypertext page. 



26. The product of claim 19 wherein said automatically
developing code further comprises:

code that extracts information identifying a
parent of said selected hypertext page from a table-of-contents
database of said structured document.

27. The product of claim 19 wherein said automatically
developing code further comprises:

code that extracts information identifying a
parent of said selected hypertext page from a selected
tag within said selected hypertext page.

28. The product of claim 27 further comprising:

code that removes said selected tag from said
selected hypertext page.

29. The product of claim 19 wherein said hypertext
page is stored as a file within a directory structure
and wherein said automatically developing code
further comprises:

code that searches for an index file within said
directory structure beginning with a directory of
said hypertext page file and moving up said directory
structure until said index file is found;
and

code that identifies said index file to be a parent
of said hypertext page.

30. The product of claim 18 further comprising:

code that inserts said information at the top of
said selected hypertext page.

31. The product of claim 30 further comprising:

code that transmits said selected hypertext
page as modified by said inserting code via a network.

32. The product of claim 18 wherein said hypertext
page is in HTML format.